Fast Multi-stage Submodular Maximization
Authors
Abstract
Motivated by extremely large-scale machine learning problems, we introduce a new multistage algorithmic framework for submodular maximization (called MultGreed), where at each stage we apply an approximate greedy procedure to maximize surrogate submodular functions. The surrogates serve as proxies for a target submodular function but require less memory and are easy to evaluate. We theoretically analyze the performance guarantee of the multi-stage framework and give examples of how to design instances of MultGreed for a broad range of natural submodular functions. We show that, given appropriate surrogate functions, MultGreed performs very close to the standard greedy algorithm, and we argue that our framework can easily be integrated with distributive algorithms for further optimization. We complement our theory with an empirical evaluation on several real-world problems, including data subset selection on millions of speech samples, where MultGreed yields at least a thousand-fold speedup and superior results over state-of-the-art selection methods.
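The multi-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stage budgets, the toy coverage objective, the subsampled surrogate, and the names `greedy_extend`, `multistage_greedy`, `COVER`, and `SAMPLE` are all assumptions made for illustration, and a plain (exact) greedy step stands in for the approximate greedy procedure the abstract mentions.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

SetFunction = Callable[[Set[int]], float]


def greedy_extend(f: SetFunction, ground: Sequence[int],
                  selected: Set[int], budget: int) -> Set[int]:
    """Add up to `budget` items, each time taking the largest marginal gain under f."""
    chosen = set(selected)
    for _ in range(budget):
        base = f(chosen)
        best_item, best_gain = None, 0.0
        for v in ground:
            if v in chosen:
                continue
            gain = f(chosen | {v}) - base
            # Strict comparison: ties go to the earliest item in `ground`.
            if best_item is None or gain > best_gain:
                best_item, best_gain = v, gain
        if best_item is None:  # nothing left to add
            break
        chosen.add(best_item)
    return chosen


def multistage_greedy(stages: List[Tuple[SetFunction, int]],
                      ground: Sequence[int]) -> Set[int]:
    """Run greedy stage by stage; each stage continues from the previous selection."""
    selected: Set[int] = set()
    for f, budget in stages:
        selected = greedy_extend(f, ground, selected, budget)
    return selected


# Hypothetical toy data: item -> set of elements it covers.
COVER: Dict[int, Set[int]] = {
    0: {1, 2, 3}, 1: {3, 4}, 2: {5}, 3: {1, 2}, 4: {4, 5, 6},
}
SAMPLE = {1, 3, 5}  # assumed subsample of elements used by the cheap surrogate


def target(s: Set[int]) -> float:
    """Target function: number of elements covered by the selected items."""
    return float(len(set().union(*(COVER[v] for v in s)))) if s else 0.0


def surrogate(s: Set[int]) -> float:
    """Surrogate: coverage counted only on the subsampled elements."""
    return float(len(set().union(*(COVER[v] for v in s)) & SAMPLE)) if s else 0.0


ground = sorted(COVER)
# Stage 1 picks one item with the surrogate; stage 2 picks one more with the target.
result = multistage_greedy([(surrogate, 1), (target, 1)], ground)
print(sorted(result), target(result))
```

In this toy instance the surrogate only looks at half of the elements, so each evaluation touches less data, while the final stage still uses the target function to refine the selection. How well such a scheme tracks standard greedy depends on how closely the surrogate approximates the target, which is the subject of the analysis the abstract refers to.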
Related resources
Fast Multi-Stage Submodular Maximization: Extended version
Motivated by extremely large-scale machine learning problems, we introduce a new multistage algorithmic framework for submodular maximization (called MultGreed), where at each stage we apply an approximate greedy procedure to maximize surrogate submodular functions. The surrogates serve as proxies for a target submodular function but require less memory and are easy to evaluate. We theoreticall...
Learning Sparse Combinatorial Representations via Two-stage Submodular Maximization
We consider the problem of learning sparse representations of data sets, where the goal is to reduce a data set in a manner that optimizes multiple objectives. Motivated by applications of data summarization, we develop a new model which we refer to as the two-stage submodular maximization problem. This task can be viewed as a combinatorial analogue of representation learning problems such as dic...
Fast Constrained Submodular Maximization: Personalized Data Summarization
Can we summarize multi-category data based on user preferences in a scalable manner? Many utility functions used for data summarization satisfy submodularity, a natural diminishing returns property. We cast personalized data summarization as an instance of a general submodular maximization problem subject to multiple constraints. We develop the first practical and FAst coNsTrained submOdular Ma...
Multi-document Summarization via Budgeted Maximization of Submodular Functions
We treat the text summarization problem as maximizing a submodular function under a budget constraint. We show, both theoretically and empirically, that a modified greedy algorithm can efficiently solve the budgeted submodular maximization problem near-optimally, and we derive new approximation bounds in doing so. Experiments on the DUC’04 task show that our approach is superior to the best-performing me...